Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
Parkinson's disease (PD) is a chronic disease affecting motor skills, with symptoms including tremor and rigidity. Current diagnostic procedures use patient assessments to evaluate symptoms, and sometimes magnetic resonance imaging (MRI) scans. However, symptom variability leads to inaccurate assessments, and the analysis of MRI scans requires experienced specialists. This study proposes to accurately diagnose PD severity by combining symptom data and MRI data from the Parkinson's Progression Markers Initiative database. A new hybrid model architecture was implemented to make full use of both forms of clinical data, and models based on symptoms only and on MRI scans only were also developed. The symptom-based model integrates a fully connected deep neural network, while the MRI-scan and hybrid models integrate transfer-learning-based convolutional neural networks. All models classify patients into five severity categories, with the first stage representing healthy individuals and the remaining stages representing PD patients. The symptom-only, MRI-scan-only, and hybrid models achieved accuracies of 0.77, 0.68, and 0.94, respectively. The hybrid model also achieved high precision and recall scores of 0.94 and 0.95. A real clinical case confirmed the strong performance of the hybrid model, in which a patient misclassified by the two other models was correctly classified by the hybrid. Its performance was also consistent across the five severity stages, indicating accurate early detection. This is the first report to combine symptom data and MRI scans at such a large scale with machine learning methods.
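The hybrid architecture described above (a fully connected branch for symptom scores, fused with CNN-derived MRI features) can be sketched as follows. All dimensions, the randomly initialised weights, and the late-fusion design are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical dimensions: 12 symptom scores, 64 CNN-derived MRI features,
# 5 severity classes (first stage = healthy, remaining stages = PD).
n_sym, n_mri, n_cls = 12, 64, 5

# Randomly initialised weights stand in for trained parameters.
W_sym = rng.normal(size=(n_sym, 32))   # symptom branch (fully connected)
W_mri = rng.normal(size=(n_mri, 32))   # MRI branch (after a frozen CNN backbone)
W_out = rng.normal(size=(64, n_cls))   # classifier head on the fused features

def hybrid_predict(symptoms, mri_features):
    h_sym = np.maximum(symptoms @ W_sym, 0.0)       # ReLU layer on symptom data
    h_mri = np.maximum(mri_features @ W_mri, 0.0)   # ReLU on transfer-learned features
    fused = np.concatenate([h_sym, h_mri], axis=-1) # late fusion of both branches
    return softmax(fused @ W_out)

probs = hybrid_predict(rng.normal(size=n_sym), rng.normal(size=n_mri))
print(probs.shape, round(float(probs.sum()), 6))  # → (5,) 1.0
```

Late fusion (concatenating branch features before one shared head) is one plausible way to combine the two modalities; the paper may fuse at a different depth.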
Recent work has reported that AI classifiers trained on audio recordings can accurately predict severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection status. Here, we undertake a large-scale study of audio-based deep learning classifiers, as part of the UK government's pandemic response. We collect and analyse a dataset of audio recordings from 67,842 individuals with linked metadata, including reverse transcription polymerase chain reaction (PCR) test outcomes, of whom 23,514 tested positive for SARS-CoV-2. Subjects were recruited via the UK government's National Health Service Test-and-Trace programme and the REal-time Assessment of Community Transmission (REACT) randomised surveillance survey. In an unadjusted analysis of our dataset, AI classifiers predict SARS-CoV-2 infection status with high accuracy (Receiver Operating Characteristic Area Under the Curve (ROC-AUC) 0.846 [0.838, 0.854]), consistent with the findings of previous studies. However, after matching on measured confounders, such as age, gender, and self-reported symptoms, our classifiers' performance is much weaker (ROC-AUC 0.619 [0.594, 0.644]). Upon quantifying the utility of audio-based classifiers in practical settings, we find them to be outperformed by simple predictive scores based on user-reported symptoms.
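The core finding above, high apparent accuracy that collapses after matching on confounders, can be illustrated with a synthetic sketch. The data-generating process, the single binary confounder, and all numbers below are invented for illustration; the study's actual matched analysis is more involved:

```python
import numpy as np

rng = np.random.default_rng(1)

def roc_auc(scores, labels):
    """Mann-Whitney formulation: P(score of a positive > score of a negative)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    return float((pos[:, None] > neg[None, :]).mean())

# Synthetic data: the classifier score mostly reflects a confounder
# (self-reported symptoms) that also drives infection status.
n = 4000
symptomatic = rng.random(n) < 0.5
label = (rng.random(n) < np.where(symptomatic, 0.7, 0.2)).astype(int)
score = symptomatic.astype(float) + 0.3 * rng.normal(size=n)

auc_unadjusted = roc_auc(score, label)
# Matched analysis: evaluate within each confounder stratum, then average.
auc_matched = float(np.mean([roc_auc(score[s], label[s])
                             for s in (symptomatic, ~symptomatic)]))
print(f"unadjusted ROC-AUC {auc_unadjusted:.2f}, matched ROC-AUC {auc_matched:.2f}")
```

Within each stratum the score carries no information about the label, so the matched ROC-AUC falls to chance level even though the unadjusted ROC-AUC looks strong.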
Since early in the coronavirus disease 2019 (COVID-19) pandemic, there has been interest in using artificial intelligence methods to predict COVID-19 infection status based on vocal audio signals, for example cough recordings. However, existing studies have limitations in terms of data collection and the assessment of the performance of the proposed predictive models. This paper rigorously assesses state-of-the-art machine learning techniques used to predict COVID-19 infection status based on vocal audio signals, using a dataset collected by the UK Health Security Agency. This dataset includes acoustic recordings and extensive study participant metadata. We provide guidelines on testing the performance of methods to classify COVID-19 infection status based on acoustic features, and we discuss how these can be extended more generally to the development and assessment of predictive methods based on public health datasets.
The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, during dominant transmission of the Alpha and Delta SARS-CoV-2 variants and some Omicron variant sublineages. Audio recordings of volitional coughs, exhalations, and speech were collected in the 'Speak up to help beat coronavirus' digital survey alongside demographic, self-reported symptom and respiratory condition data, and linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,794 of 72,999 participants and 24,155 of 25,776 positive cases. Respiratory symptoms were reported by 45.62% of participants. This dataset has additional potential uses for bioacoustics research, with 11.30% of participants reporting asthma, and 27.20% with linked influenza PCR test results.
Self-supervised image denoising techniques emerged as convenient methods that allow training denoising models without requiring ground-truth noise-free data. Existing methods usually optimize loss metrics that are calculated from multiple noisy realizations of similar images, e.g., from neighboring tomographic slices. However, those approaches fail to utilize the multiple contrasts that are routinely acquired in medical imaging modalities like MRI or dual-energy CT. In this work, we propose the new self-supervised training scheme Noise2Contrast that combines information from multiple measured image contrasts to train a denoising model. We stack denoising with domain-transfer operators to utilize the independent noise realizations of different image contrasts to derive a self-supervised loss. The trained denoising operator achieves convincing quantitative and qualitative results, outperforming state-of-the-art self-supervised methods by 4.7-11.0%/4.8-7.3% (PSNR/SSIM) on brain MRI data and by 43.6-50.5%/57.1-77.1% (PSNR/SSIM) on dual-energy CT X-ray microscopy data with respect to the noisy baseline. Our experiments on different real measured data sets indicate that Noise2Contrast training generalizes to other multi-contrast imaging modalities.
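A minimal sketch of the Noise2Contrast idea: a denoiser applied to one contrast, composed with a domain-transfer operator, is trained against the noisy measurement of another contrast, whose noise is independent. The box-filter "denoiser" and the intensity-inversion "domain transfer" below are toy stand-ins for the trainable networks in the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two contrasts of the same object (e.g. two MRI contrasts), independent noise each.
x = np.linspace(0.0, 1.0, 32)
clean_a = np.tile(x, (32, 1))           # smooth intensity ramp
clean_b = 1.0 - clean_a                 # toy second contrast: inverted intensities
noisy_a = clean_a + 0.1 * rng.normal(size=clean_a.shape)
noisy_b = clean_b + 0.1 * rng.normal(size=clean_b.shape)

def denoise(img):
    """Stand-in for the trainable denoiser: a 3x3 box filter on the interior."""
    out = img.copy()
    out[1:-1, 1:-1] = sum(
        img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
    ) / 9.0
    return out

def domain_transfer(img):
    """Stand-in for the learned A->B mapping; here the known inversion."""
    return 1.0 - img

# Self-supervised loss: the denoised contrast A, pushed through the domain
# transfer, is compared against the *noisy* contrast B. Because B's noise is
# independent of A's, the noisy target is a valid supervision signal
# (the Noise2Noise argument) -- no clean image is ever needed.
pred_b = domain_transfer(denoise(noisy_a))
loss = float(np.mean((pred_b - noisy_b) ** 2))
print(f"self-supervised loss: {loss:.4f}")
```

In the actual method both the denoiser and the domain-transfer operator are learned networks, and the loss above is minimised by gradient descent; the sketch only shows why the noisy second contrast is a usable target.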
Experiments on hardware are a key aspect of robotics and control, from both an educational and a research perspective. Over the past decade, many open-source hardware and software frameworks for wheeled robots have been presented, mostly in the form of unicycle and car-like robots, with the aim of making robotics accessible to a wider audience and supporting control systems development. Unicycles are usually small and inexpensive, and therefore facilitate experiments with larger fleets, but they are not suited to high-speed motion. Car-like robots are more agile, but are usually larger and more expensive, and thus require more space and financial resources. To bridge this gap, we present Chronos, a new car-like 1/28-scale robot with customized open-source electronics, and CRS, an open-source software framework for control and robotics. The CRS software framework includes implementations of various state-of-the-art algorithms for control, estimation, and multi-agent coordination. With this work, we aim to make working with hardware easier and to reduce the engineering time needed to start new educational and research projects.
Operational flare forecasting aims to provide predictions that can be used to make decisions, typically on a daily scale, about the space-weather impacts of flare occurrence. This study shows that video-based deep learning can be used for operational purposes when the training and validation sets used to optimize the network are generated while taking into account the periodicity of the solar cycle. Specifically, the paper describes an algorithm that can be used to build sets of active regions balanced according to the flare-class rates associated with a specific cycle. These sets are used to train and validate a Long-term Recurrent Convolutional Network composed of a combination of a convolutional neural network and a long short-term memory network. The reliability of this approach is assessed in two prediction windows containing the solar storms of March 2015 and September 2017, respectively.
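The set-construction step above, balancing active regions according to the flare-class rates of a given cycle, amounts to a stratified split. The catalogue, class labels, rates, and split fraction below are synthetic placeholders, not the paper's algorithm in detail:

```python
import random

random.seed(0)

# Hypothetical catalogue: (active_region_id, strongest_flare_class) pairs.
classes = ["quiet", "C", "M", "X"]
catalogue = [(i, random.choices(classes, weights=[60, 25, 10, 5])[0])
             for i in range(1000)]

def balanced_split(catalogue, val_fraction=0.2):
    """Split active regions so that training and validation sets preserve
    the flare-class rates of the cycle (stratified by flare class)."""
    train, val = [], []
    for c in classes:
        group = [ar for ar in catalogue if ar[1] == c]
        random.shuffle(group)
        k = int(len(group) * val_fraction)
        val.extend(group[:k])
        train.extend(group[k:])
    return train, val

train, val = balanced_split(catalogue)
rate = lambda split, c: sum(ar[1] == c for ar in split) / len(split)
print({c: (round(rate(train, c), 2), round(rate(val, c), 2)) for c in classes})
```

Because each flare class is split independently, rare classes (M and X) appear at nearly the same rate in both sets, which is the property the network-optimization sets need.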
As non-binary people receive increasing attention in Western societies, gender-fair language strategies have begun to move beyond binary (female/male only) conceptions of gender. However, hardly any approaches to date have incorporated these identities into machine translation models. A lack of understanding of the socio-technical implications of such technologies risks further reproducing linguistic mechanisms of oppression and mislabelling. In this paper, we describe the methods and results of a workshop on gender-fair language and language technologies, which was led and organized by ten researchers from TU Wien, St. Pölten UAS, FH Campus Wien, and the University of Vienna, and took place in Vienna in autumn 2021. A wide range of interest groups and their representatives were invited to ensure that the topic could be addressed holistically. Accordingly, we aimed to include translators, machine translation experts, and non-binary individuals (as "community experts") on an equal footing. Our analysis shows that gender in machine translation requires a high degree of context sensitivity; that developers of such technologies need to position themselves cautiously in a process that is still being negotiated in society; and that flexible approaches currently seem most suitable. We then outline the steps that follow from these results for the field of gender-fair language technologies, so that technological development can adequately align with social progress. [German abstract manually added by arXiv admins]
Protein-protein interaction (PPI) networks consist of the physical and/or functional interactions between the proteins of an organism. Since the biophysical and high-throughput methods used to construct PPI networks are expensive, time-consuming, and often contain inaccuracies, the resulting networks are usually incomplete. To infer the missing interactions in these networks, we propose novel link-prediction methods based on continuous-time classical and quantum random walks. In the case of quantum walks, we examine the usage of both the network adjacency and Laplacian matrices to control the walk dynamics. We define score functions based on the corresponding transition probabilities and perform tests on four real-world PPI datasets. Our results show that continuous-time classical random walks and quantum walks using the network adjacency matrix can successfully predict missing protein-protein interactions, with performance rivalling the state of the art.
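For the classical continuous-time case, the walk's transition probabilities are given by the heat kernel exp(-tL) of the graph Laplacian, and absent node pairs can be scored by these probabilities. The toy five-node network and the choice t = 1 below are illustrative; the paper's score functions and its quantum-walk variants differ in detail:

```python
import numpy as np

# Toy undirected PPI-style network on 5 proteins.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A  # graph Laplacian

def heat_kernel(L, t=1.0):
    """Transition probabilities P(t) = expm(-t L) of the continuous-time
    classical random walk, via eigendecomposition (L is symmetric)."""
    w, V = np.linalg.eigh(L)
    return V @ np.diag(np.exp(-t * w)) @ V.T

S = heat_kernel(L)
# Score every currently-absent pair by its walk transition probability.
candidates = [(i, j) for i in range(n) for j in range(i + 1, n) if A[i, j] == 0]
ranked = sorted(candidates, key=lambda p: -S[p])
print("top predicted link:", ranked[0])
```

The rows of the heat kernel sum to one (L annihilates the all-ones vector), so each row is a genuine probability distribution over end nodes; pairs connected by more short paths receive higher scores.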